{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "In this chapter, we will learn how to use the pre-trained BERT model in detail. First, we will look at the different configurations of the pre-trained BERT model open-sourced by Google. Then we will learn how to use the pre-trained BERT as a feature extractor. We will also explore Hugging Face's transformers library and learn how to use it for extracting embeddings from pre-trained BERT. \n",
    "\n",
    "Moving on, we will understand how to extract embeddings from all encoder layers of BERT. Next, we will learn how to finetune the pre-trained BERT for the downstream tasks. First, we will learn to finetune the pre-trained BERT for a text classification task. Next, we will learn to implement finetuning BERT for sentiment analysis tasks using the transformers library. Then we will look into finetuning pre-trained BERT for natural language inference, question answering task, and named entity recognition tasks. \n",
    "\n",
    "In this chapter, we will learn the following topics:\n",
    "\n",
    "- Pre-trained BERT model \n",
    "- Extracting embeddings from pre-trained BERT\n",
    "- Extracting embeddings from all encoder layers of BERT \n",
    "- Finetuning BERT for downstream tasks \n",
    "\n",
    "\n",
    "# Pre-trained BERT model\n",
    "In Chapter 3, Understanding a BERT Model, we learned how to pre-train the BERT using masked language modeling and next sentence prediction. But pretraining the BERT from scratch is computationally expensive. So, we can download the pre-trained BERT model and use it. Google has open-sourced the pre-trained BERT model and we can download it from Google Research's BERT GitHub repository - https://github.com/google-research/bert. They have released the pre-trained BERT with various configurations as shown in the following figure.  denotes the number of encoder layers and  denotes the size of the hidden unit (representation size): \n",
    "\n",
    "\n",
    "![title](images/1.png)\n",
    "\n",
    "The pre-trained model is also available in the BERT-uncased and BERT-cased format. In the BERT-uncased, all the tokens are lowercased but in the BERT-cased, the tokens are not lowercased and used directly for training. Okay, which pre-trained BERT model we should use? BERT-cased or BERT-uncased? BERT-uncased model is the one that is most commonly used but if we are working on specific tasks like name entity recognition where we have to preserve the case then in that premise we can use the BERT-cased model. Along with these, Google also released the pre-trained BERT models which are trained using the whole word masking method. \n",
    "\n",
    "Okay, but how exactly we can use the pre-trained BERT model? We can use the pre-trained model in the following two ways:\n",
    "\n",
    "- As a feature extractor by extracting embeddings \n",
    "- By fine-tuning the pre-trained BERT model on downstream tasks like text classification, question-answering, and more\n",
    "\n",
    "In the upcoming sections, we will understand how to use the pre-trained BERT model as a feature extractor by extracting embeddings and we will also learn how to finetune the pre-trained BERT model for downstream tasks in detail. \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
